Conversation

@will-rice
Contributor

@will-rice will-rice commented Apr 4, 2024

What does this PR do?

This is a fix for an issue I opened earlier today. I'll link it down below.

Fixes #7575

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Comment on lines +507 to +513
else:
    conditioning = self.transformer_blocks[0].norm1.emb(
        timestep, class_labels, hidden_dtype=hidden_states.dtype
    )
    shift, scale = self.proj_out_1(F.silu(conditioning)).chunk(2, dim=1)
    hidden_states = self.norm_out(hidden_states) * (1 + scale[:, None]) + shift[:, None]
    hidden_states = self.proj_out_2(hidden_states)
Member

Isn't this practically the same as what we're doing when norm_type == "ada_norm"?

Contributor Author

It is similar, but ada_norm doesn't take class labels as an argument. I moved this into the else branch because the original condition was if norm_type != ada_norm_single. From what I can tell, that block still only supports the norm used in the original DiT implementation. It might be worth refactoring it to allow other norm types.

Member

We could then still condition on whether class_labels is not None, or something like that, no?

Contributor Author

If class_labels is None then you can't use AdaLayerNormZero. I'm not sure what the default norm should be when you want to condition on text or audio, but I picked AdaLayerNorm because it was similar to the zero variant without needing class labels.
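
For illustration, here is a rough, self-contained sketch of the distinction in plain PyTorch (not the actual diffusers classes; all names and sizes below are made up):

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy adaptive layer norm, conditioned on timestep only vs. timestep + class label.
dim, num_classes, num_train_timesteps = 64, 10, 1000

t_emb = nn.Embedding(num_train_timesteps, dim)   # timestep embedding
class_emb = nn.Embedding(num_classes, dim)       # class-label embedding
to_scale_shift = nn.Linear(dim, 2 * dim)
norm = nn.LayerNorm(dim, elementwise_affine=False)

x = torch.randn(2, 16, dim)                      # (batch, seq_len, dim)
t = torch.tensor([3, 7])
labels = torch.tensor([1, 2])

# "ada_norm"-style: conditioning comes from the timestep alone.
cond = t_emb(t)
# "ada_norm_zero"-style: class labels are folded into the conditioning as well,
# which is why that path can't be used when class_labels is None.
cond_with_labels = t_emb(t) + class_emb(labels)

scale, shift = to_scale_shift(F.silu(cond)).chunk(2, dim=1)
out = norm(x) * (1 + scale[:, None]) + shift[:, None]
print(out.shape)  # torch.Size([2, 16, 64])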

Member

How about ada_norm_single, i.e. the one used in PixArt-Alpha?

Contributor Author

If you use that one without additional arguments you get this error:

TypeError: PixArtAlphaCombinedTimestepSizeEmbeddings(
  (time_proj): Timesteps()
  (timestep_embedder): TimestepEmbedding(
    (linear_1): Linear(in_features=256, out_features=1408, bias=True)
    (act): SiLU()
    (linear_2): Linear(in_features=1408, out_features=1408, bias=True)
  )
) argument after ** must be a mapping, not NoneType

It requires the arguments resolution and aspect_ratio, but those could default to None because they aren't required when use_additional_conditions=False.
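
As a minimal sketch of what I mean (the constructor and forward signatures below are assumptions based on the diffusers version I was testing and may differ in other versions):

import torch
from diffusers.models.embeddings import PixArtAlphaCombinedTimestepSizeEmbeddings

# Assumed signatures; verify against your installed diffusers version.
emb = PixArtAlphaCombinedTimestepSizeEmbeddings(
    embedding_dim=1408, size_emb_dim=1408 // 8, use_additional_conditions=False
)
timestep = torch.tensor([1, 2], dtype=torch.long)

# With use_additional_conditions=False, resolution and aspect_ratio are unused,
# so passing None works. The TypeError above comes from the caller unpacking a
# None added_cond_kwargs dict with ** before this forward is even reached.
conditioning = emb(
    timestep,
    resolution=None,
    aspect_ratio=None,
    batch_size=2,
    hidden_dtype=torch.float32,
)
print(conditioning.shape)  # expected: torch.Size([2, 1408])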

Member

Do you think changing that block would be more appropriate?

@yiyixuxu WDYT?

Comment on lines 46 to 47
scale, shift = torch.chunk(emb, 2, dim=1)
x = self.norm(x) * (1 + scale[:, None, :]) + shift[:, None, :]
Member

Can you please explain the reasoning behind this set of changes?

Contributor Author

The default argument for dim in torch.chunk is dim=0, so it was splitting on the batch dimension. The second line change broadcasts the scale and shift to the correct shape, which is exactly what is done everywhere else in the implementation where scale and shift are used. I could change it to scale[:, None] and shift[:, None] to match those other places.
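
To illustrate with toy shapes (not the actual model dims):

import torch

batch, dim = 4, 8
emb = torch.randn(batch, 2 * dim)  # conditioning embedding, one row per sample

# Default torch.chunk splits on dim=0, i.e. across the batch: two (2, 16) halves.
a, b = torch.chunk(emb, 2)
print(a.shape, b.shape)  # torch.Size([2, 16]) torch.Size([2, 16])

# Splitting on dim=1 gives a per-sample scale and shift of shape (batch, dim).
scale, shift = torch.chunk(emb, 2, dim=1)
x = torch.randn(batch, 10, dim)  # (batch, seq_len, dim) hidden states
out = x * (1 + scale[:, None, :]) + shift[:, None, :]  # broadcast over seq_len
print(out.shape)  # torch.Size([4, 10, 8])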

Contributor Author

Actually that caused some of the tests to fail. I will take a look at that.

Contributor Author

@will-rice will-rice Apr 5, 2024

This is the test that fails. Is it intended that AdaLayerNorm doesn't support a batch dimension? If you add a batch dimension to this test, it passes with my changes.

timestep_1 = torch.tensor(1, dtype=torch.long).to(torch_device)
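
For example, the batched variant I tried locally (illustrative, not part of the repo):

timestep_1 = torch.tensor([1], dtype=torch.long).to(torch_device)  # shape (1,) instead of a scalar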

Member

Probably because of how the integration was done. During that process, the number one priority is to get the model integrated into the library, so exhaustiveness isn't prioritized at all.

Member

@sayakpaul sayakpaul left a comment

Thanks for the PR. Left some comments.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@will-rice
Contributor Author

Something I want to add is that I'm not partial to ada_norm here. I just want to use DiT without needing class labels or the additional arguments for PixArtAlphaCombinedTimestepSizeEmbeddings.

@github-actions
Contributor

github-actions bot commented May 5, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the "stale" label on May 5, 2024
@sayakpaul
Member

Gently pinging @yiyixuxu

@github-actions github-actions bot removed the "stale" label on Sep 14, 2024
@github-actions
Contributor

github-actions bot commented Oct 9, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the "stale" label on Oct 9, 2024
@will-rice will-rice closed this Oct 16, 2024
@pranayj77

Was this ever fixed?

@pranayj77

@will-rice @sayakpaul

@will-rice
Contributor Author

I'm not sure if this specifically was fixed, but there are several DiT models to choose from now.

@pranayj77

Got it, thanks. I'll use DiTTransformer2DModel or something else.

@will-rice
Contributor Author

That's what I ended up doing.

@wufeim

wufeim commented Sep 9, 2025

Doesn't DiTTransformer2DModel still ask for class_labels, since ada_norm is not properly implemented or tested?

# Validate inputs.
if norm_type != "ada_norm_zero":
    raise NotImplementedError(
        f"Forward pass is not implemented when `patch_size` is not None and `norm_type` is '{norm_type}'."
    )
elif norm_type == "ada_norm_zero" and num_embeds_ada_norm is None:
    raise ValueError(
        f"When using a `patch_size` and this `norm_type` ({norm_type}), `num_embeds_ada_norm` cannot be None."
    )
